An Efficient Distribution of Labor in a Two Stage Robust Interpretation Process
Although Minimum Distance Parsing (MDP) offers a theoretically attractive
solution to the problem of extragrammaticality, it is often computationally
infeasible in large-scale practical applications. In this paper we present an
alternative approach in which the labor is distributed between a more restrictive
partial parser and a repair module. Though two-stage approaches have grown in
popularity in recent years because of their efficiency, they have done so at
the cost of requiring hand-coded repair heuristics. In contrast, our two-stage
approach does not require any hand-coded knowledge sources dedicated to repair,
thus making it possible to achieve a similar run-time advantage over MDP
without losing the quality of domain independence.
Comment: 9 pages, 1 PostScript figure, uses aclap.sty and psfig.tex. In Proceedings of EMNLP 199
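The following is a minimal sketch of the two-stage control flow the abstract describes: a restrictive partial parser produces fragments, and a generic repair step assembles them without hand-coded repair rules. All names, data structures, and the trivial "parsing" logic are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative two-stage robust interpretation pipeline (assumed structure):
# stage 1 is a restrictive partial parser, stage 2 is a domain-independent
# repair step that combines the surviving fragments.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    """A partial analysis covering a contiguous span of the input."""
    start: int
    end: int
    analysis: str


def partial_parse(tokens: List[str]) -> List[Fragment]:
    """Stage 1: return analyses only for spans the restrictive parser can
    cover, instead of forcing a full (costly) MDP-style parse."""
    fragments = []
    for i, tok in enumerate(tokens):
        # Placeholder rule: treat each recognized word as a trivial fragment.
        if tok.isalpha():
            fragments.append(Fragment(i, i + 1, tok.lower()))
    return fragments


def repair(fragments: List[Fragment]) -> Optional[str]:
    """Stage 2: combine the fragments into one hypothesis, skipping the
    uncovered (extragrammatical) material, with no hand-coded repair rules."""
    if not fragments:
        return None
    ordered = sorted(fragments, key=lambda f: f.start)
    return " ".join(f.analysis for f in ordered)


def interpret(utterance: str) -> Optional[str]:
    return repair(partial_parse(utterance.split()))


if __name__ == "__main__":
    print(interpret("uh I want want a flight to Boston please"))
```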
Towards Multilingual Automatic Dialogue Evaluation
The main limiting factor in the development of robust multilingual dialogue
evaluation metrics is the lack of multilingual data and the limited
availability of open-sourced multilingual dialogue systems. In this work, we
propose a workaround for this lack of data by leveraging a strong multilingual
pretrained LLM and augmenting existing English dialogue data using Machine
Translation. We empirically show that the naive approach of finetuning a
pretrained multilingual encoder model with translated data is insufficient to
outperform the strong baseline of finetuning a multilingual model with only
source data. Instead, the best approach consists of carefully curating the
translated data using MT Quality Estimation metrics, excluding low-quality
translations that hinder its performance.
Comment: SIGDIAL2
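A hedged sketch of the curation idea follows: keep a machine-translated dialogue example only if an MT Quality Estimation (QE) score clears a threshold. The `quality_estimate` callable stands in for whatever QE metric is actually used (e.g., a reference-free COMET-style model), and the threshold value is illustrative, not taken from the paper.

```python
# Sketch of QE-based curation of machine-translated dialogue data (assumed
# interfaces): filter out translations whose QE score is below a threshold so
# low-quality translations do not degrade the finetuned metric.

from typing import Callable, Dict, List


def curate_translations(
    examples: List[Dict[str, str]],              # each has "source" and "translation"
    quality_estimate: Callable[[str, str], float],
    threshold: float = 0.8,                      # illustrative cut-off
) -> List[Dict[str, str]]:
    """Keep only translated examples whose QE score meets the threshold."""
    return [
        ex for ex in examples
        if quality_estimate(ex["source"], ex["translation"]) >= threshold
    ]


if __name__ == "__main__":
    data = [
        {"source": "How are you today?", "translation": "Wie geht es dir heute?"},
        {"source": "How are you today?", "translation": "Wie gehen Tag?"},
    ]
    # Dummy QE stand-in: longer translations score higher (illustration only).
    dummy_qe = lambda src, tgt: min(1.0, len(tgt.split()) / max(1, len(src.split())))
    print(curate_translations(data, dummy_qe))
```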
Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation
Despite significant research effort in the development of automatic dialogue
evaluation metrics, little thought is given to evaluating dialogues in languages
other than English. At the same time, ensuring metrics are invariant to semantically
similar responses is also an overlooked topic. To achieve the desired
properties of robustness and multilinguality for dialogue evaluation metrics,
we propose a novel framework that takes advantage of the strengths of current
evaluation models together with the newly established paradigm of prompting Large
Language Models (LLMs). Empirical results show our framework achieves
state-of-the-art results in terms of mean Spearman correlation scores across several
benchmarks and ranks first on both the Robust and Multilingual tasks of
the DSTC11 Track 4 "Automatic Evaluation Metrics for Open-Domain Dialogue
Systems", proving the evaluation capabilities of prompted LLMs.
Comment: DSTC11 best paper for Track